NMR Ensembles of Models

Caveat
I wrote the initial content for this page based on discussions I've had with NMR spectroscopists and crystallographers. Be warned that I am far from an expert and have done neither NMR spectroscopy nor crystallography myself. Until this page is vetted by an expert, its content should be considered provisional, not authoritative! Eric Martz 01:25, 26 June 2008 (IDT)

Structure Determination by NMR
As of mid-2008, about 14% of the entries in the Protein Data Bank had been determined by nuclear magnetic resonance (NMR) in solution, 85% by X-ray crystallography, and less than 1% by other methods. NMR can be used only for relatively small macromolecules (see below).

The primary data yielded by NMR analysis are mostly local geometric information about atoms within the structure and, more recently, global information as well. These include distances between pairs of atoms, dihedral angles (typically backbone φ angles and some side-chain χ1 angles) and sometimes global information such as the orientation of a given bond with respect to a fixed axis of the molecule. These data are used as "restraints" to build 3D models that are compatible with the NMR data. All calculations are performed directly in real (physical) space: each calculation starts from a random conformation of the macromolecule, which is progressively folded to satisfy the restraints. Typically, several runs are performed, each starting from a different initial conformation, to check that the calculation converges onto a single solution. The result is thus an ensemble of models whose spread gives a measure of the precision of the NMR structure.
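To make the idea of a restraint concrete, here is a minimal Python sketch that checks hypothetical upper-bound distance restraints against each model of a tiny ensemble. The atom labels, coordinates, and bounds are invented for illustration; real restraint files and structure-calculation programs use their own formats and more elaborate scoring.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
# A hypothetical restraint: two atom labels and an upper distance bound in Å.
Restraint = Tuple[str, str, float]


def violations(model: Dict[str, Coord], restraints: List[Restraint],
               tolerance: float = 0.0) -> List[Tuple[str, str, float]]:
    """Return (atom1, atom2, excess) for every restraint the model exceeds."""
    out = []
    for a1, a2, upper in restraints:
        d = math.dist(model[a1], model[a2])
        if d > upper + tolerance:
            out.append((a1, a2, d - upper))
    return out


# Two toy models of the same three atoms (illustrative values, not real data).
model_1 = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 4.0, 0.0)}
model_2 = {"H1": (0.0, 0.0, 0.0), "H2": (3.5, 0.0, 0.0), "H3": (0.0, 6.0, 0.0)}
restraints = [("H1", "H2", 5.0), ("H1", "H3", 5.0)]

for name, model in [("model 1", model_1), ("model 2", model_2)]:
    print(name, violations(model, restraints))
```

Model 1 satisfies both restraints, while model 2 exceeds the second bound by 1.0 Å. A model that is "compatible with the NMR data" is one with no (or only small) violations of this kind.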

Model building for NMR experiments typically starts with the complete protein or nucleic acid chain, including hydrogen atoms. The restraints are then applied. The resulting models therefore usually include the entire protein or nucleic acid chains, unlike X-ray crystallographic models, which often lack the chain ends, and sometimes even loops in the middle of chains, because these regions are disordered in the crystal.

Macromolecular structure determination by NMR is done in aqueous solution, and thus requires that the molecule be soluble. For more information, see Nature of 3D Structural Data and NMR in Wikipedia.

Display of NMR Models by Proteopedia
Proteopedia shows all the models in an NMR ensemble. This lets you see where the models agree with each other and where they differ. Each model is shown as a thin backbone trace (a line connecting the alpha carbon atoms of amino acids, or the phosphorus atoms in DNA or RNA chains). The backbone traces are colored by Amino to Carboxy "rainbow", a spectral sequence of colors starting at the amino terminus (or 5' terminus of nucleic acid chains) and ending at the carboxy terminus (or 3' terminus).
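The rainbow coloring amounts to mapping each residue's position along the chain onto a hue. The sketch below assumes a ramp from blue at the amino terminus to red at the carboxy terminus and uses Python's standard `colorsys` module; it illustrates the idea only and is not Proteopedia's or Jmol's actual palette.

```python
import colorsys
from typing import List, Tuple


def rainbow_colors(n_residues: int) -> List[Tuple[int, int, int]]:
    """One (r, g, b) color (0-255) per residue, amino terminus first.

    Assumed ramp: hue 2/3 (blue) at the first residue down to hue 0 (red)
    at the last residue.
    """
    if n_residues == 1:
        return [(0, 0, 255)]
    colors = []
    for i in range(n_residues):
        fraction = i / (n_residues - 1)        # 0.0 at N-terminus, 1.0 at C-terminus
        hue = (2.0 / 3.0) * (1.0 - fraction)   # blue -> cyan -> green -> yellow -> red
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors


print(rainbow_colors(5))
```

Because the color depends only on position along the chain, the same residue has the same color in every model, which is what makes corresponding regions easy to compare across the ensemble.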

Ligands (hetero atoms) are also shown for all models; they are opaque in model 1 and translucent in all other models. Ligand atoms are colored by element, using the CPK color scheme. Examples with hetero groups covalently linked to chain termini, in extremely variable positions, are 1jsa and 1dqc. 1bah also has hetero groups in variable positions. 1hpn has only hetero atoms.

The example at right shows the 3 models of 1lcd, a lac repressor domain bound to DNA, with one sodium ion. Water is present in this entry, but for clarity Proteopedia does not show it in the initial scene. Show water. (To hide water again, click the green initial scene link just below the molecule.)

Disulfide bonds are shown as yellow rods connecting backbones, with the first model opaque, and all other models translucent. An example is 1iw4.

Individual Models
To view individual models, click on Jmol (lower right corner below the molecule) to open Jmol's menu, and use the All N models item (where N is the total number of models in the ensemble). For example, clicking on 1.1: 1 displays only model 1, and the menu item then reads model 1/N. You can also use Jmol's menu to change the rendering and coloring.
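Behind the viewer, an NMR ensemble in PDB format is a single file in which each model is bracketed by MODEL and ENDMDL records. The plain-Python sketch below (not how Jmol itself works) splits such text into models, the file-level counterpart of choosing one model from the menu. The two-model excerpt and its coordinates are invented.

```python
from typing import Dict, List


def split_models(pdb_text: str) -> Dict[int, List[str]]:
    """Group ATOM/HETATM records by model number.

    Atom records that appear before any MODEL record (as in a single-model
    crystal structure file) are assigned to model 1.
    """
    models: Dict[int, List[str]] = {}
    current = 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # The serial number sits in columns 11-14; split() tolerates loose spacing.
            current = int(line.split()[1])
        elif record in ("ATOM", "HETATM"):
            models.setdefault(current, []).append(line)
    return models


# A tiny two-model excerpt in PDB format (coordinates invented for illustration).
example = """\
MODEL        1
ATOM      1  CA  GLY A   1      11.104   6.134  -6.504  1.00  0.00           C
ATOM      2  CA  ALA A   2      14.700   6.900  -5.600  1.00  0.00           C
ENDMDL
MODEL        2
ATOM      1  CA  GLY A   1      11.304   6.034  -6.404  1.00  0.00           C
ATOM      2  CA  ALA A   2      14.500   7.300  -5.100  1.00  0.00           C
ENDMDL
END
"""

models = split_models(example)
print(f"{len(models)} models")   # comparable to the N in "All N models"
for line in models[1]:           # comparable to displaying only model 1
    print(line)
```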

Animating NMR Ensembles
When the models in an NMR ensemble are played like a movie, the resulting animation simulates thermal motion (although not all the apparent motion is necessarily real; see below). To animate the models, click on Jmol (lower right corner below the molecule) to open Jmol's menu. Choose Animation, then Animation mode, and click on Loop. Then choose Animation again, and click Play. You can change the speed of the animation with FPS (frames per second) on the Animation menu. By default, there is a pause at the first and last models.

NMR Experiments Yield Multiple Models
When a macromolecular structure is determined by nuclear magnetic resonance (NMR) in solution, the result is an ensemble of multiple molecular models, each of which is consistent with the experimental data. This is because the results of an NMR experiment are a large number of restraints, chiefly inter-atomic distances, and many different models can satisfy them. In contrast, the result of an X-ray crystallographic experiment is a single model that best fits the experimental electron density. (When the resolution is very high, the model may include alternative positions for some atoms.)

The number of models published depends on the experiment and is chosen by the authors; it ranges from 2 (e.g. 1cvo) to over 100, with a median of 20. (You can search for entries with a specified number or range of models using OCA.) The first model in the ensemble has no special significance (see the most representative model).

Meaning of the Variation Between Models
The variation between models in the ensemble can mean either of two things. It can represent actual flexibility and thermal motion that occurred during the NMR measurements in solution, typically at room temperature. Alternatively, it can simply reflect uncertainty in the atomic positions: too few restraints were available to determine the positions of some atoms. Unfortunately, NMR ensembles have nothing comparable to the B value (temperature factor) that quantifies the positional uncertainty of each atom in crystallographic results.

Specific NMR relaxation experiments can, however, measure the dynamics of individual atoms, mainly backbone amide groups, because the relaxation of the NMR signal depends on the internal motions of the molecule. When relaxation data are available, they can be used to determine order parameters, which are strongly correlated with crystallographic B values. These can be used to distinguish intrinsic flexibility from uncertainty due to a lack of restraints. When relaxation data are not available, the only way to find out what the variation between models means is to contact the authors of the published ensemble.
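For a superimposed ensemble of M models, the spread at atom i is often summarized as a root-mean-square fluctuation (RMSF) about the atom's mean position. It can also be expressed as a "pseudo-B" through the isotropic crystallographic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction, i.e. one third of RMSF². This is only a formal analogy: the RMSF measures how much the models disagree, not whether that disagreement reflects motion or missing restraints.

```latex
\begin{aligned}
\bar{\mathbf r}_i &= \frac{1}{M}\sum_{m=1}^{M} \mathbf r_i^{(m)} \\
\mathrm{RMSF}_i &= \sqrt{\frac{1}{M}\sum_{m=1}^{M} \bigl\lVert \mathbf r_i^{(m)} - \bar{\mathbf r}_i \bigr\rVert^{2}} \\
B_i^{\text{pseudo}} &= 8\pi^2 \langle u_i^2 \rangle = \frac{8\pi^2}{3}\,\mathrm{RMSF}_i^{2}
\end{aligned}
```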

Protein chains commonly have more variation between models at the ends than in the middle. An example is 2yru.
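The larger spread at chain ends can be quantified per atom. The plain-Python sketch below computes the RMSF of each alpha carbon across an ensemble that is assumed to be already superimposed; the five-residue coordinates are invented so that the middle agrees exactly and the ends do not.

```python
import math
from typing import List, Sequence

Coord = Sequence[float]


def rmsf(ensemble: List[List[Coord]]) -> List[float]:
    """Per-atom RMSF about the mean position; ensemble[m][i] is atom i of model m.

    Assumes the models have already been superimposed.
    """
    n_models = len(ensemble)
    values = []
    for i in range(len(ensemble[0])):
        # Mean position of atom i over all models.
        mean = [sum(model[i][k] for model in ensemble) / n_models for k in range(3)]
        # Mean squared distance of atom i from that mean position.
        msd = sum(math.dist(model[i], mean) ** 2 for model in ensemble) / n_models
        values.append(math.sqrt(msd))
    return values


# Invented CA traces of a 5-residue chain: the ends disagree, the middle agrees.
ensemble = [
    [(0.0, 1.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 1.0, 0.0)],
    [(0.0, -1.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, -1.0, 0.0)],
]

print(rmsf(ensemble))  # -> [1.0, 0.0, 0.0, 0.0, 1.0]
```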

With appropriate methods (for example, combining the ensemble with the relaxation data described above), it is possible to characterize both the average structure and its dynamics.

The Most-Representative Model
The most representative model is the model closest to the average model. A server called Olderado reports the most representative model, and enables you to download it separately.
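Following the definition above literally, the most representative model can be found by averaging the coordinates of a superimposed ensemble and picking the model with the lowest RMSD to that average. The plain-Python sketch below does exactly that with invented coordinates; Olderado's own procedure may differ in detail.

```python
import math
from typing import List, Sequence, Tuple

Coord = Sequence[float]


def average_model(ensemble: List[List[Coord]]) -> List[Tuple[float, ...]]:
    """Atom-by-atom mean coordinates of a superimposed ensemble."""
    n = len(ensemble)
    return [tuple(sum(model[i][k] for model in ensemble) / n for k in range(3))
            for i in range(len(ensemble[0]))]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length coordinate lists (no superposition done here)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def most_representative(ensemble: List[List[Coord]]) -> int:
    """1-based number of the model with the lowest RMSD to the average model."""
    avg = average_model(ensemble)
    scores = [rmsd(model, avg) for model in ensemble]
    return scores.index(min(scores)) + 1


# Three invented two-atom models; the average lies at y = 0.4.
ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.5, 0.2, 0.0)],
    [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)],
]

print(most_representative(ensemble))  # -> 2
```

Note that the average itself is not returned as "the" structure: it is only a reference point for choosing a real model from the ensemble.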

The Minimized Average Model
It is common to average the coordinates of the models from an NMR experiment. Averaging distorts covalent geometry, however, so for the result to be realistic it must undergo some energy minimization to restore proper bond lengths and angles. The result is called a minimized average model. Sometimes authors publish both the ensemble and the minimized average. For example, 2bbm appears to be the minimized average for the ensemble of 21 models in 2bbn, but without reading the original publication or contacting the authors, it is difficult to be sure (since the header of the PDB file does not say).
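A single bond shows why a raw coordinate average needs minimization. In the sketch below (invented coordinates), a 1.5 Å bond points in different directions in two models; averaging the atom positions leaves a bond only about 1.06 Å long, which no real molecule would have.

```python
import math

# Invented coordinates: bond A-B is 1.5 Å long in both models,
# but points along x in model 1 and along y in model 2.
model_1 = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0)}
model_2 = {"A": (0.0, 0.0, 0.0), "B": (0.0, 1.5, 0.0)}


def average_position(models, atom):
    """Mean coordinates of one atom over several models."""
    n = len(models)
    return tuple(sum(m[atom][k] for m in models) / n for k in range(3))


average = {atom: average_position([model_1, model_2], atom) for atom in ("A", "B")}

for label, m in [("model 1", model_1), ("model 2", model_2), ("average", average)]:
    print(f"{label}: bond length {math.dist(m['A'], m['B']):.3f} Å")
# model 1: 1.500 Å, model 2: 1.500 Å, average: 1.061 Å
```

The more the models disagree in a region, the worse this distortion becomes, which is why the averaged coordinates are minimized before being deposited as a model.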

Reliability of NMR Models
NMR models are more likely to contain major errors than are crystallographic models that have good Resolution and Free R values. See also Quality assessment for molecular models.

Median Size of Published NMR Structures
Solution NMR cannot determine atomic-resolution protein structures for molecules larger than about 30,000 daltons (30 kDa). In practice, the median mass of NMR structures published in the Protein Data Bank is about 9 kDa, and 90% are smaller than 19 kDa. In contrast, the median mass of crystallographically determined structures is 45 kDa, and 90% are smaller than 145 kDa.
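To relate these masses to chain length, a common rule of thumb is about 110 Da per amino acid residue. The sketch below applies it to the figures above; it is only an approximation, and for multi-chain entries it estimates the total number of residues rather than the length of one chain.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein chain (an approximation).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_residues(mass_da: float) -> int:
    """Very rough residue count for a protein of the given mass in daltons."""
    return round(mass_da / AVERAGE_RESIDUE_MASS_DA)


for label, mass_da in [
    ("NMR practical limit", 30_000),
    ("NMR median", 9_000),
    ("NMR 90th percentile", 19_000),
    ("X-ray median", 45_000),
    ("X-ray 90th percentile", 145_000),
]:
    print(f"{label}: {mass_da / 1000:g} kDa ~ {approx_residues(mass_da)} residues")
```

By this estimate the 30 kDa limit corresponds to roughly 270 residues, and the typical NMR structure to roughly 80.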

Alignment of Models
NMR models are typically structurally aligned by the authors before publication. However, there are some exceptions, such as 1qp6, 1dl0, and 1i25, in which the individual models are not aligned. In such cases, one needs to look at individual models in order to understand the molecular structure.
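For unaligned entries, models can be superimposed by least-squares fitting; a common choice is the Kabsch algorithm. The NumPy sketch below is one such implementation. It assumes both models contain the same atoms in the same order, uses invented coordinates, and is not necessarily the procedure used by the depositors of any given entry.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rigidly rotate and translate `mobile` (N x 3) to best fit `target` (N x 3).

    Minimizes the RMSD between corresponding rows (Kabsch algorithm).
    """
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    h = p.T @ q                              # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_center


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between corresponding atoms."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Invented coordinates: model 2 is model 1 rotated 90 degrees about z and shifted.
model_1 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.5]])
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
model_2 = model_1 @ rot_z_90.T + np.array([5.0, -2.0, 3.0])

print(f"before: {rmsd(model_2, model_1):.3f} Å")
print(f"after:  {rmsd(kabsch_superpose(model_2, model_1), model_1):.3f} Å")
```

The determinant check keeps the result a proper rotation; without it, the fit could return a mirror image of the model.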

The alignment can affect your perception of the variation between models. For example, calmodulin contains two globular domains, each built from a pair of EF-hands, connected by a flexible linker. When calmodulin is not bound to a cognate peptide, the two domains can move relative to each other by flexing the linker. In 1cfc, the N-terminal domains are aligned, but the C-terminal domains are in different orientations. Had the C-terminal domains been aligned instead, the N-terminal domains would appear in variable orientations. Less plausibly, had a short central segment of the linker been aligned, both domains would appear in variable orientations.

Another example of two folded domains (zinc fingers) connected by a flexible linker is 1zu1. Again, only one domain can be aligned at a time, and the choice of which one is arbitrary.
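The arbitrariness of the alignment can be reproduced with a toy two-domain molecule. In the NumPy sketch below (invented coordinates), model 2 differs from model 1 only by a 90° hinge rotation of domain B. The superposition is fitted on one domain's atoms and then applied to the whole model: fitting on domain A makes B look variable, and fitting on B makes A look variable.

```python
import numpy as np


def fit(mobile: np.ndarray, target: np.ndarray):
    """Kabsch fit: return (R, t) such that x @ R.T + t maps mobile onto target."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, tc - r @ mc


def superpose_on(mobile: np.ndarray, target: np.ndarray, subset) -> np.ndarray:
    """Fit on the atoms listed in `subset` only, then move every atom of `mobile`."""
    r, t = fit(mobile[subset], target[subset])
    return mobile @ r.T + t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


domain_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
domain_b = domain_a + np.array([10.0, 0.0, 0.0])
model_1 = np.vstack([domain_a, domain_b])

# Model 2: domain B swung 90 degrees about a z-axis hinge at (5, 0, 0).
hinge = np.array([5.0, 0.0, 0.0])
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
model_2 = np.vstack([domain_a, (domain_b - hinge) @ rot_z_90.T + hinge])

atoms_a, atoms_b = [0, 1, 2, 3], [4, 5, 6, 7]
for name, subset in [("domain A", atoms_a), ("domain B", atoms_b)]:
    moved = superpose_on(model_2, model_1, subset)
    print(f"aligned on {name}: RMSD A = {rmsd(moved[atoms_a], model_1[atoms_a]):.2f} Å, "
          f"RMSD B = {rmsd(moved[atoms_b], model_1[atoms_b]):.2f} Å")
```

Both superpositions are equally valid fits to the same two models; only the choice of fitted atoms differs.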

External Resources

 * Methods for Determining Atomic Structures discussed at the Protein Data Bank